Querying Infinispan - Infinispan 5.1

The infinispan-query module

This module adds querying capabilities to Infinispan. It uses Hibernate Search and Apache Lucene to index and search objects in the cache. It allows users to obtain objects within the cache without needing to know the keys to each object that they want to obtain, so you can now search your objects basing on some of it's properties, for example to retrieve all red cars (exact metadata match), or all books about a specific topic (full text search and relevance scoring).

Usage with Infinispan 5

Indexing must be enabled in the configuration (as explained in the next paragraph); then you interact with the Search capabilities via a SearchManager which exposes all needed functionality.

Simple example

We're going to store Book instances in Infinispan; each Book will be defined as in the following example; we have to choose which properties are indexed, and for each property we can optionally choose advanced indexing options using the annotations defined in the Hibernate Search project.

// example values stored in the cache and indexed:
import org.hibernate.search.annotations.*;

//to be indexed the object needs both @Indexed and @ProvidedId annotations:
@Indexed @ProvidedId
public class Book {
   @Field String title;
   @Field String description;
   @Field @DateBridge(resolution=Resolution.YEAR) Date publicationYear;
   @IndexedEmbedded Set<Author> authors = new HashSet<Author>();
}

public class Author {
   @Field String name;
   @Field String surname;
   // hashCode() and equals() omitted
}

Now assuming we stored several Book instances in our Infinispan Cache, we can search them for any matching field as in the following example.

// get the search manager from the cache:
SearchManager searchManager = org.infinispan.query.Search.getSearchManager(cache);

// create any standard Lucene query, via Lucene's QueryParser or any other means:
org.apache.lucene.search.Query fullTextQuery = //any Apache Lucene Query

// convert the Lucene query to a CacheQuery:
CacheQuery cacheQuery = searchManager.getQuery( fullTextQuery );

// get the results:
List<Object> found = cacheQuery.list();

A Lucene Query is often created by parsing a query in text format such as "title:infinispan AND authors.name:sanne", or by using the query builder provided by Hibernate Search.

// get the search manager from the cache:
SearchManager searchManager = org.infinispan.query.Search.getSearchManager( cache );

// you could make the queries via Lucene APIs, or use some helpers:
QueryBuilder queryBuilder = searchManager.buildQueryBuilderForClass( Book.class ).get();

// the queryBuilder has a nice fluent API which guides you through all options.
// this has some knowledge about your object, for example which Analyzers
// need to be applied, but the output is a failry standard Lucene Query.
org.apache.lucene.search.Query luceneQuery = queryBuilder.phrase()
                  .onField( "description" )
                  .andField( "title" )
                  .sentence( "a book on highly scalable query engines" )
                  .createQuery();

// the query API itself accepts any Lucene Query, and on top of that
// you can restrict the result to selected class types:
CacheQuery query = searchManager.getQuery( luceneQuery, Book.class );

// and there are your results!
List<Book> objectList = query.list();

for ( Book book : objectList ) {
      System.out.println( book.getTitle() );
}

A part from list() you have the option for streaming results, or use pagination.

This barely scratches the surface of all what is possible to do: see the Hibernate Search reference documentation to learn about sorting, numeric fields, declarative filters, caching filters, complex object graph indexing, custom types and the powerfull faceting search API.

Notable differences with Hibernate Search

Using @DocumentId to mark a field as identifier is not supported; instead all @Indexed objects should also be marked with @ProvidedId : Infinispan will provide the identifier, which is the key used to store each value in the cache.

Configuration via XML

To enable indexing via XML, you need to add the <indexing ... /> element to your cache configuration, and optionally pass additional properties to the embedded Hibernate Search engine:

<?xml version="1.0" encoding="UTF-8"?>
<infinispan
      xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
      xsi:schemaLocation="urn:infinispan:config:5.1 http://www.infinispan.org/schemas/infinispan-config-5.1.xsd"
      xmlns="urn:infinispan:config:5.1">
   <default>
      <indexing enabled="true" indexLocalOnly="true">
         <properties>
            <property name="hibernate.search.default.directory_provider" value="ram" />
         </properties>
      </indexing>
   </default>
</infinispan>

In this example the index is stored in memory, so when this nodes is shutdown the index is lost: good for a quick demo, but in real world cases you'll want to use the default (store on filesystem) or store the index in Infinispan as well. For the complete reference of properties to define, refer to the Hibernate Search documentation.

Using programmatic configuration and index mapping

In the following example we start Infinispan programmatically, avoiding XML configuration files, and also map an object Author which is to be stored in the grid and made searchable on two properties but without annotating the class.

SearchMapping mapping = new SearchMapping();
mapping.entity(Author.class).indexed().providedId()
      .property("name", ElementType.METHOD).field()
      .property("surname", ElementType.METHOD).field();

Properties properties = new Properties();
properties.put(org.hibernate.search.Environment.MODEL_MAPPING, mapping);
properties.put("hibernate.search.[other options]", "[...]");

Configuration infinispanConfiguration = new ConfigurationBuilder()
      .indexing()
         .enable()
         .indexLocalOnly(true)
         .withProperties(properties)
      .build();

DefaultCacheManager cacheManager = new DefaultCacheManager(infinispanConfiguration);

Cache<Long, Author> cache = cacheManager.getCache();
SearchManager sm = Search.getSearchManager(cache);

Author author = new Author(1, "Manik", "Surtani");
cache.put(author.getId(), author);

QueryBuilder qb = sm.buildQueryBuilderForClass(Author.class).get();
Query q = qb.keyword().onField("name").matching("Manik").createQuery();
CacheQuery cq = sm.getQuery(q, Author.class);
Assert.assertEquals(cq.getResultSize(), 1);

Cache modes and managing indexes

Index management is currently controlled by the Configuration.setIndexLocalOnly() setter, or the <indexing indexLocalOnly="true" /> XML element. If you set this to true, only modifications made locally on each node are considered in indexing. Otherwise, remote changes are considered too.

Regarding actually configuring a Lucene directory, please refer to the Hibernate Search documentation on how to pass in the appropriate Lucene configuration via the Properties object passed to QueryHelper.

LOCAL

In local mode, you may use any Lucene Directory implementation. And it doesn't matter what you set indexLocalOnly to.

REPLICATION

In replication mode, each node can have it's own local copy of the index. So indexes can either be stored locally on each node (RAMDirectory, FSDirectory, etc) but you need to set indexLocalOnly to false , so that each node will apply needed updates it receives from other nodes in addition to the updates started locally. Any Directory implementation can be used, but you have to make sure that when a new node is started it receives an up to date copy of the index; typically rsync is well suited for this task, but being an external operation you might end up with a slightly out-of-sync index, especially if updates are very frequent.

Alternately, if you use some form of shared storage for indexes (see Sharing the Index ), you then have to set indexLocalOnly to true so that each node will apply only the changes originated locally; in this case there's no risk in having an out-of-sync index, but to avoid write contention on the index you should make sure that a single node is "in charge" of updating the index. Again, the Hibernate Search reference documentation describes means to use a JMS queue or JGroups to send indexing tasks to a master node.

The diagram below shows a replicated deployment, in which each node has a local index.

images/author/download/attachments/18645131/QueryingInfinispan-REPLonly.png

DISTRIBUTION and INVALIDATION

For these 2 cache modes, you need to use a shared index and set indexLocalOnly to true. In future, we will be able to deal with truly distributed queries, but that would be after ISPN-200.

The diagram below shows a deployment with a shared index. Note that while not mandatory, a shared index can be used for replicated (vs. distributed) caches as well.

images/author/download/attachments/18645131/QueryingInfinispan-DISTINVALandREPL.png

Sharing the Index

The most simple way to share an index is to use some form of shared storage for the indexes, like an FSDirectory on a shared disk; however this form is problematic as the FSDirectory relies on specific locking semantics which are often incompletely implemented on network filesystems, or not reliable enough; if you go for this approach make sure to search for potential problems on the Lucene mailing lists for other experiences and workarounds. Good luck, test well.

There are many alternative Directory implementations you can find, one of the most suited approaches when working with Infinispan is of course to store the index in an Infinispan cache: have a look at the InfinispanDirectoryProvider, as all Infinispan based layers it can be combined with persistent CacheLoaders to keep the index on a shared filesystem withouth the locking issues, or alternatively in a database, cloud storage, or any other CacheLoader implementation; you could backup the index in the same store used to backup your values.

For full documentation on clustering the Lucene engine, refer to the Hibernate Search documentation to properly configure it clustered.

Clustering the Index in Infinispan

Again the configuration details are in the Hibernate Search reference, in particular in the infinispan-directories section. This backend will by default start a secondary Infinispan CacheManager, and optionally take another Infinispan configuration file: don't reuse the same configuration or you will start grids recursively!
It is currently not possible to share the same CacheManager.